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A STATE-SPACE MIXED MEMBERSHIP BLOCKMODEL 
FOR DYNAMIC NETWORK TOMOGRAPHY 

By Wenjie Fu, Le Song and Eric P. Xing * 

School of Computer Science, Carnegie Mellon University 

In a dynamic social or biological environment, the interactions 
between the underlying actors can undergo large and systematic 
changes. The latent roles or membership of the actors as determined 
by these dynamic links will also exhibit rich temporal phenomena, 
assuming a distinct role at one point while leaning more towards 
a second role at an another point. To capture this dynamic mixed 
membership in rewiring networks, we propose a state space mixed 
membership stochastic blockmodel which embeds an actor into a la- 
tent space and track its mixed membership in the latent space across 
time. We derived efficient approximate learning and inference algo- 
rithms for our model, and applied the learned models to analyze a 
social network between monks, and a rewiring gene interaction net- 
work of Drosophila melanogaster collected during its full life cycle. 
In both cases, our model reveals interesting patterns of the dynamic 
roles of the actors. 

1. Introduction. Inference of network tomography - the semantic un- 
derpinnings of entities in both static and dynamic networks, is fundamental 
for understanding the organization and function of complex relational struc- 
tures in natural, sociocultural, and technological systems. In a social system 
such as a friendship network, network tomography can capture the latent 
social roles of individuals; inferring such roles based on the social interac- 
tions among individuals is fundamental for understanding the importance of 
members in a network, for interpreting the social structure of various com- 
munities in a network, and for modeling the behavioral, sociological, and 
epidemiological processes mediated by the vertices in a network. In systems 
biology, network tomography often translates to latent biochemical or ge- 
netic functions of interacting biological molecules such as proteins, mRNAs, 
or metabolites in a regulatory circuity; elucidating such functions based on 
the topology of molecular networks can advance our understanding of the 
mechanisms of how a complex biological system regulates itself and reacts to 
stimuli. More broadly, network tomography can lead to important insights 
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to the robustness of network structures and their vulnerabilities; the cause 
and consequence of information diffusion; and the mechanism of hierarchy 
and organization formation. By appropriately modeling network tomogra- 
phy, a network analyzer can also simulate and reason about the generative 
mechanisms of networks, and discover changing roles in networks, which will 
be relevant for activity and anomaly detection. 

In most real-world complex systems, for example, a genetic regulation 
network, the measurable attributes and relations of objects are often func- 
tions of latent temporal processes which can fluctuate, evolve, emerge and 
terminate stochastically. In the context of biological network modeling, this 
means that over the course of a cellular process, such as a cell cycle or an 
immune response, there may exist multiple underlying "themes" that de- 
termine the functionalities of each molecule and their relationships to each 
other, and such themes are dynamical and stochastic. As a result, the molec- 
ular networks at each time point are context-dependent and can undergo 
systematic rewiring, rather than being i.i.d. samples from a single under- 
lying distribution, as assumed in most current biological network studies. 
We are interested in understanding the mechanisms that drive the tempo- 
ral rewiring of biological networks during various cellular and physiological 
processes, and similar phenomena in time-varying social networks. 

A major limitation of most current methods for network modeling and 
inference (Hoff et al, 2002; Li and McCallum, 2006; Handcock et al., 2007) 
is that, they assume each actor (i.e., vertex), such as a social individual or a 
biological molecule in a network, undertakes a single and invariant role (or 
functionality, class label, etc., depending on the domain of interest), when 
interacting with other vertices. In many realistic social and biological sce- 
narios, every actor can play multiple roles (or under multiple influences) and 
the specific role being played dependents on whom the actor is interacting 
with; and the roles undertaken by an actor can change over time. In this 
paper, we propose a new Bayesian approach for role identification that will 
capture the multi-facet, context-specific, and temporal nature of an actor's 
role in large, heterogeneous, and evolving dynamic networks. The proposed 
method will build on a modified version of the mixed membership model 
formalism (Erosheva et al., 2004), which enables network links to be instan- 
tiated by role-specific local connection mechanisms, each link is underlined 
by a separately chosen latent functional cause, and supports analyzing pat- 
terns of interactions between actors via statistically inferring the embedding 
of a network in a latent "tomographic-space" . 

Modeling embedding of networks in latent state space offers an intuitive 
but powerful approach to infer the semantic underpinnings of each actor, 
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such as its biological or social roles or other entity functions, underlying the 
observed network topologies. Via such a model, one can map every actor 
in a network to a position in a low-dimensional Euclidean space, where 
the roles of the actors are reflected in the role or functional coordinates 
of the actors in the latent space and the relationships among actors are 
reflected in their Euclidian distances. We can naturally capture the dynamics 
of role evolution of actors in such a tomographic-space, and other latent 
dynamic processes driving the network evolution by furthermore applying a 
state-space models (SSMs) popular in object tracking over the positions of 
the "tomographic-embeddings" of all actors, where a logistic- normal mixed 
membership stochastic blockmodel (MMSB) is employed as the emission 
model to define time-specific condition likelihood of the observed networks 
over time. The resulting model should be formally known as a state space 
mixed membership stochastic blockmodel, but for simplicity in this paper we 
will refer to it as a dynamic MMSB (or in short, dMMSB); and we will show 
that this model allows one to infer the trajectory of the roles of each actor 
based on the posterior distribution of its role-vector. 

Given network data, the dMMSB can be learned based on maximal like- 
lihood principle using a variational EM algorithm (Ghahramani and Beal, 
2001; Xing et al., 2003; Ahmed and Xing, 2007), the resulting network pa- 
rameters reveal not only cluster membership information of each node over 
time, but also other interesting regularities in the network topology. We have 
applied this model to the well-known Sampson's monk social network and 
a sequence of time- varying genetic interaction networks estimated from the 
Drosophila genome-wise microarray time series, and we will present some 
previously unnoticed dynamic behaviors of network actors in these data. 

The remaining part of the paper is organized as follow. In Section 2, we 
first briefly review some related work. In Section 3 we present the dMMSB 
model in detail. The Laplace variational EM algorithm for approximate 
inference under dMMSB will be described in Section 4. In Section 4 we 
present two case studies on the monks network and the Drosophila genetic 
network using dMMSB, along with some simulation based validation of the 
model. Some discussions will be given in Section 5. Algebraic details of 
the derivations of the approximate inference algorithm are provided in the 
appendix. 

2. Related work. There is a vast and growing body of literature on 
model-based statistical analysis of network data, traditionally focusing on 
descriptive models such as the Exponential Random Graph (ERG) mod- 
els (Frank and Strauss, 1986; Wasserman and Pattison, 1996), and more re- 
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cently moving toward more generative types of models such as those that 
model the network structure as being caused by the actors' positions in a la- 
tent "social space" (Hoff et al., 2002; Hoff, 2003). Among these models, some 
variants of the ERGs, such as the stochastic block models (Holland et al., 
1983; Fienberg et al., 1985; Wasserman and Pattison, 1996; Snijders, 2002; 
Hoff et al., 2002), cluster network vertices based on their structural equiva- 
lency (Lorrain and White, 1971). The latent space models (LSM) instead 
project nodes onto a latent space, where their similarities can be visu- 
alized and explored (Hoff et al., 2002; Hoff, 2003). The mixed member- 
ship stochastic models proposed in Airoldi et al. (2005) integrate ideas from 
these models, but went further by allowing each node to belong to mul- 
tiple blocks (i.e., groups) with fractional membership. Variants of mixed 
membership models have appeared in population genetics (Pritchard et al., 
2000), text modeling (Blei et al., 2003a), analysis of multiple disability mea- 
sures (Erosheva and Fienberg, 2005), etc. In most of these cases, mixed 
membership models are used as a latent-space projection method to project 
high-dimensional attribute data into a lower-dimensional "aspect-space" , as 
a normalized mixed membership vector, which reflects the weight of each la- 
tent aspects (e.g., roles, functions, topics, etc.) associated with each object. 
The mixed membership vectors often serve as a surrogate of the original 
data for subsequent analysis such as classification (Blei et al., 2003b). The 
mixed membership stochastic model we developed earlier has been applied 
for functional prediction in protein-protein interaction networks and role 
identification in Sampson's 18-monk social network (Airoldi et al., 2005, 
2006). It uses the aforementioned mixed membership vector to define an 
actor-specific multinomial distribution, from which specific actor roles can 
be sampled when interacting with other actors. For each monk, it yields a 
multi-class social-identity prediction which captures the fact that his inter- 
actions with different other monks may be under different social contexts. 
For each gene, it yield a multi-class functional prediction which captures 
the fact that its interactions with different genes may be under different 
functional contexts. 

We intend to use the state space model popular in object tracking and 
trajectory modeling for inferring underlying functional changes in network 
elements (i.e., nodes), and sensing emergence and termination of "function 
themes" underlying network sequences. This scheme has been adopted in a 
number of recent work on extracting evolving topical themes in text docu- 
ments (Blei and Lafferty, 2006b; Wang and McCallum, 2006) or author em- 
beddings (Sarkar and Moore, 2005) based on author, text, and reference 
network of archived publications. 
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3. Modeling Dynamic Network Tomography. Consider a tempo- 
ral series of networks {G^,...,G^} over a vertex set V, where GW = 
{V,E®} represents the network observed at time t. In this paper, we as- 
sume that N = \V\ is invariant over time; thus = {efj}f j=1 denote the 
set of (possibly transient) links observed at time t between a fixed set of N 
vertices. 

To model both the multi-class nature of every vertex in a network, and the 
latent semantic characteristics of the vertex-classes and their relationships 
to inter-vertices interactions, we assume that at any time point, every vertex 
Vi € V in the network, such as a social actor or a biological molecule, can 
undertake multiple roles or functions instantiated from a predefined latent 
tomographic space according to a time- varying distribution pt(); and the 
weights (i.e., proportion of "contribution") of the involved roles/functions 
can be represented by a normalized vector 7?^ of fixed dimension K. We re- 
fer to each role, function, or other domain-specific semantics underlying the 
vertices as a membership of a latent class. Earlier stochastic blockmodel of 
networks restricted each vertex to belong to a single and invariant member- 
ship, which therefore leads to a (noisy) block matrix representation of the 
pairwise interaction matrix of the network after appropriate permutation of 
its rows and columns. Each block in this matrix corresponds to two groups 
(indexed by row and column indices, respectively) of interacting vertices; 
the probability of inter- vertex interaction is modeled by a block-specific (i.e., 
group-pair specific) "compatibility potential"; but each vertex can only bel- 
low to one block. In this paper, we assume that each vertex can have mixed 
memberships, that is, it can undertake multiple roles/functions within a 
single network when interacting with various network neighbors with dif- 
ferent roles/functions, and the proportions of the mixed-memberships iff 
can evolve over time. Furthermore, we assume that the links between nodes 
are instantiated stochastically according to a compatibility function over the 
roles undertaken by the vertex-pair in question, and we define the compat- 
ibility coefficients between all possible pair of roles using a time-evolving 
role-compatibility matrix = {/?£*;}■ 

3.1. Basic Mixed Membership Stochastic Blockmodel. Under a mixed mem- 
bership network model, as first proposed in Airoldi et al. (2005), network 
links can be instantiated by a role-specific local interaction mechanisms: the 
link between each pair of actors, say (i, j), is instantiated according to the 
latent role specifically undertaken by actor i when it is to interact with j, 
and also the latent role of j when it is to interact with i. More specifically, 
suppose that each different role-pair, say roles k and I, between actors has 
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a unique probability distribution P(-\(3k,i) of having a link between actor 
pairs with that role combination, then a basic mixed membership stochastic 
blockmodel (MMSB) posits the following generative scheme for a network: 

1. For each vertex i, draw the mixed-membership vector: 7?, ~ P( m \0) 

2. For each possible interacting vertex j of vertex i, draw the link indicator e i:j £ {0, 1} 
as follows: 

• draw latent roles Zi->j ~ Multinomial(-|7Ti, 1), ~ Multinomial(-|7fj, 1), 
where denotes the role of actor i when it is to interact with j, and zi^j 
denotes the role of actor j when it is approached by i. Here s^_,j and Zi^j 
are unit indictor vectors in which one element is one and the rest are zero; it 
represents the fc-th role if and only if the fc-th element of the vector is one, 
for example, Zi^ jik = 1 or Zj^ iik = 1. 

• and draw e i: j | (zi-, 3 -,k = l)Z 3 %-i,i = 1) ~ Bernoulli(-|/3 fc .i). 

Specifically, the generative model above defines a conditional probability 
distribution of the relations E = {e-ij} among vertices in a way that re- 
flects naturally interpretable latent semantics (e.g., roles, functions, cluster 
identities) of the vertices. The link e^j represents a binary actor-to-actor 
relationship. For example, the existence of a link could mean that a package 
has been sent from one person to another, or one has a positive impression 
on another, or one gene is regulated by another. Each vertex Vi is associated 
with a set of latent membership labels {i^— .} (if the links are undi- 
rected, as in PPI, then we can ignore the asymmetry of "— >" and "<—"). 
Thus the semantic underpinning of each interaction between vertices is cap- 
tured by a pair of instantiated memberships unique to this interaction; and 
the nature and strength of the interaction is controlled by the compatibility 
function determined by this pair of memberships instantiation. For exam- 
ple, if actors A and C are of role X while actors B and D are of role Y, 
we may expect that the relationship from A to B is likely to be the same 
as relationship from C to D, because both of them are from a role-X actor 
to a role-y actor. In this sense, a role is like a class label in a classification 
task. However, under a mixed membership model, an actor can have differ- 
ent role instantiations when interacting with different neighbors in the same 
network. 

The role-compatibility matrix B = {(3k,i\ decides the affinity between 
roles. In some cases, the diagonal elements of the matrix may dominate 
over other elements, which means actors of the same role are more likely 
to connect to each other. In the case where we need to model differential 
preference among different roles, richer block patterns can be encoded in the 
role-compatibility matrix. The flexibility of the choices of the B matrix give 
rise to strong expressivity of the model to deal with complex relational pat- 
terns. If necessary, a prior distribution over elements in B can be introduced, 
which can offer desirable smoothing or regularization effects. 
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Crucial to our goal of role-prediction and role-evolution modeling for net- 
work data, is the so called "role vector" 7Tj of the mixed-membership co- 
efficients in the above generative model, which represents the overall role 
spectrum of each actor and succinctly captures the probabilities of an actor 
involving in different roles (or influenced by different socio-cultural factors) 
when he/she interacts with another actor. Much of the expressiveness of the 
mixed-membership models lies in the choice of the prior distribution for the 
mixed-membership coefficients 7?$, and the prior for the interaction coeffi- 
cients {Pkl}- F° r example, in Airoldi et al. (2005), a simple Dirichlet prior 
was employed because it is conjugate to the multinomial distribution over 
every latent membership label {z^.,^^.} defined by the relevant 7rV Here, 
to capture non-trivial correlations among the weights (i.e., the individual 
elements within 7?j) of all latent roles of a vertex, and to allow one to intro- 
duce dynamics to the roles of each actor when modeling temporal processes 
such as a cell cycle, we employ a logistic-normal distribution over a simplex 
as in Ahmed and Xing (2007). 

Under a logistic normal prior, the first sampling step for 7Tj = [71^1, . . . , tt^k] 
in the canonical mixed membership generative model above can be broken 
down into two sub-steps: first draw 7$ according to: 

(1) 7i ~ Normal(/J, S); 

then map it to the simplex via the following logistic transformation: 

(2) 7r i)fc = exp{7 I>fe - C(7i)}, Vk = l,...,K 

K 

(3) where C{%) = log( exp{7 Iifc }) . 

fe=i 

Here C{^i) is a normalization constant (i.e., the log partition function). 
Due to the normalizability constrain of the multinomial parameters, tti only 
has K — 1 degree of freedom. Thus we only need to draw the first K — 1 
components of 7* from a (K — l)-dimensional multivariate Gaussian, and 
leave j itK = 0. For simplicity, we omit this technicality in the forth coming 
general description and operation of our model. 

Under a dynamic network tomography model, the prior distributions 
of role weights of every vertex ptQ, and the role-compatibility matrix B, 
can both evolve over time. Conditioning on the observed network sequence 
, our goal is to infer the trajectories of role vectors 7rf ) in the 
latent social space or biological function space. In the following, we present 
a generative model built on elements from the classical state space model for 
linear dynamic systems and the mixed membership stochastic blockmodel 
described above for random graphs for this purpose. 
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3.2. Dynamic Logistic-Normal Mixed Membership Stochastic Blockmodel. 
We propose to capture the dynamics of network rewiring and evolution at 
the level of both the prior distributions of the mixed membership vectors of 
vertices, and the compatibility functions governing role-to-role relationships. 
In this way we capture the dynamic behavior of the generative system of 
both vertices and relations. Our basic model structure is based on the well- 
known state-space model (SSM) popular in object tracking and trajectory 
modeling, which defines a linear dynamic transformation of the mixed mem- 
bership priors over adjacent time points: 

(4) fl {t) = A^-V +w {t \ fort>l 

where represents the mean parameter of the prior distribution of the 
mixed membership vectors of all vertices at time t, and w w ~ A/"(0, $) 
represents normal transition noise for the mixed membership prior. The 
MMSB model defined above now functions as an emission model within 
SSM that defines the conditional likelihood of the network at each time 
point. Note that the linear system on /2 (i) can lead to a bursty dynamics 
for latent admixing vector irf through the logistic normal emission model. 
Starting from this basic structure, we propose to develop a dynamical model 
for tracking underlying functional changes in network elements, and sensing 
emergence and termination of " function themes" . 

Given a sequence of network topologies over the same set of nodes, here is 
an outline of the generative process under such a model (a graphical model 
representation of this model is illustrated in Figure 1): 
• State-Space Model for Mixed Membership Prior: 



sample the mean of the mixed membership 
prior at time 1. 



sample the means of the mixed member- 
ship priors over time. 



- /2 (1) ~ Normal(V, <£>), 

For t = 1,...,T: 

- /2 (t) = Normal(A/i (t - 1) ,<l'), 

• State-Space Model for Role-Compatibility Matrix: 

For k = 1, . . . , K and k' = 1, . . . , K, 

- (1) ~ Normalft i/0 sample the compatibility coefficient be- 
^ fe,fc ' ' tween role k and k' at time 1. 



For t = 1,...,T: 

^fe.fc' ~ Normal(&?7^, 1) ,i/'), 

(t) cxp (' 7 l t fc' ' compute compatibility probabilities via lo- 

Pk,k' = — 7TTT' 



sample compatibility coefficients over sub- 
sequent time points. 



exp(jj w ' gistic transformation. 



DYNAMIC NETWORK TOMOGRAPHY 



9 



• Logistic-Normal Mixture Membership 

For each node n = 1, . . . , N, at each time j 

- 7?f 5 ~ LogisticNormal ( fl {t) , E (i) ) 
For each pair of nodes £ [1, N] x [1, P 

- ~ Multinomial ( ) 

- z^) ~ Multinomial ( ^",1 ) 

- e^' ~ Bernoulli ( z£) 'B^z^) ) 



Model for Networks 

; t= 1,...,T: 

sample a k dimensional mixed mem- 
bership vector; 

sample membership indicator for 
the donor 

sample membership indicator for 
the acceptor 

sample the links between nodes. 




u (1 



(5 


f) 

,Vx,v 








T)\ 
( J 



Fig 1 . A graphical model representation of the dynamic logistic-normal mixed membership 
stochastic blockmodel. The part enclosed by the dotted lines is a logistic-normal MMSB. 



Specifically, we assume that the mixed membership vector tt for each actor 
follows a time-specific logistic normal prior £A/"(/I (t) , S (t) ), whose mean fl® 
is evolving over time according to a linear Gaussian model. For simplicity, 
we assume that the S (t) which captures time-specific topic correlations is 
independent across time. 

It is noteworthy that unlike a standard SSM of which the latent state 
would emit a single output (i.e., an observation or a measurement) at each 
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time point, the dMMSB model outlined above generates N emissions each 
time, one corresponding to the (pre-transformed) mixed-membership vector 
7- of each vertex. To directly apply the Kalman filter and Rauch-Tung- 
Striebel smoother for posterior inference and parameter estimation under 
dMMSB, we introduce an intermediate random variable = j? Ylili^'i h 
is easy to see that follows a standard SSM re-parameterized from the 
original dMMSB: 

(5) Y {t) ~ Normal (^ {t) , , t = 1 , . . . , T. 

In principle, we can use the above membership evolution model to capture 
not only membership correlation within and between vertices at a specific 
time (as did in Blei and Lafferty (2006a)), but also dynamic coupling (i.e., 
co-evolution) of membership proportions via covariance matrix In the 
simplest scenario, when A = I and $ = al, this model reduces to ran- 
dom walk in the role-mixing space. Since in most realistic temporal series 
of networks, both the compatibility functions between vertices, and the se- 
mantic representations of role-mixing are unlikely to be invariant over time, 
we expect that even a random walk mixed membership evolution model can 
provide a better fit of the data than a static model that ignores the time 
stamps of all networks. 

4. Variational Inference. Due to difficulties both in marginalization 
over the super-exponential state space of latent variables z, and in inte- 
gration of the role vectors tt under a logistic normal prior where a closed- 
form solution is unavailable, exact posterior inference of the latent vari- 
ables of interest, and direct EM estimation of the model parameters is 
infeasible. In this section, we present a variational approximation scheme 
based on the generalized means field (GMF) approximation (Xing et al., 
2003) to infer the latent variables and estimate the model parameters. The 
GMF approach is modular, that is, we can approximate the joint posterior 
p({z (t) , 7r (t) , /T (t) , B (t) }J =1 \@, {G (t) }f =1 ^ where denotes the model parame- 
ters, by a factored approximate distribution: 

(6) qfaW^BVyij = ft^^^}^)*^ }^)*.^*'}^), 

where q\Q can be shown to be the marginal distribution of {z (i) , 7f (t) }Jl 1 
under a reparameterized logistic normal MMSB, and q^O an d (ftQ are SSMs 
over {/I (t) }f =1 and {.B (t) }f =1 , respectively, with emissions related to expec- 
tation of {z (t) , 7r w }f =1 under gi(). This can be shown by minimizing the 
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Kulback Leibler divergence between q() and p() over arbitrary choices of 
<Zi()> Q2O and (73 (), as proven in (Xing et al., 2003). The computation of the 
variational parameters of each of these approximate marginals leads to a 
coupling of all the marginals, as apparent in the descriptions in the subse- 
quent subsections. But once the variational parameters are solved, inference 
on any latent variable of interest under the joint distribution p(), which is 
intractable, can be approximated by a much simpler inference on the same 
variable in one of the marginals that contains the variables of inter- 
est. Bellow we briefly outline solutions to each of these marginals of subset 
of variables, which exactly correspond to the three building blocks of the 
dMMSB model outlined in Section 3.2. (Since // (t) and B (t) both follow a 
standard SSM, for simplicity, we only show the solution to q<i () over /x (t) , 
and treat B < - t) as an unknown invariant constant B to be estimated.) 

4.1. Variational approximation to logistic-normal MMSB. For a static 
MMSB, the inference problem is to estimate the role- vectors given model 
parameters and observations. That is, model parameters fl, £ and B are 
assumed to be known besides the observed variables E, and we want to 
compute estimates of the role vectors 7. along with role indicators and 
z.t-.. (Under dMMSB, fi is in fact unknown, but we will discuss shortly how 
to estimate it outside of the MMSB inference detailed bellow.) 

Under the Logistic-Normal MMSB, ignoring time and vertex indices, the 
marginal posterior of latent variables 7 (the pre-transformed 7?) and z can 
be written as follows: 

(7) p(j.,z.^.,z.<-. I $,Y.,B,E) cx \[p{ji | /2,S) 

i 

Y[p(zi^j,Zi^j I Ji,7j)p(eij I Zi^j,Zi^j,B) 

Marginalization over all but one hidden variables to predict, say 7*j, is 
intractable under the above model. Based on the GMF theory, we ap- 
proximate p(7*., £— >., z.^. I /i, X, B, E) with a product of simpler marginals 
<?() = Q-tOQzO, each on a cluster of latent variable subset, i.e., {^i} and 
{z^j , Zi^j} . Xing et al. (2003) proved that under GMF approximation, the 
optimal solution, q(), of each marginal over the cluster of variables is iso- 
morphic to the true conditional distribution of the cluster given its expected 
Markov Blanket. That is, 

(8) g 7 (7i) =p(li I (&-■)«») 

(9) q z (zi^j,Zi^j) = p(zi->j, z^j I eij, B, (7^, (7j)g 7 ) 
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These equations define a fixed point for g 7 and q z . The optimal marginal 
distribution of the variables in one cluster is updated when we fix the 
marginal of all the other variables, in turn. The update continues until the 
change is neglect able. 

The update formula for cluster marginal over (z^j, j) is straightfor- 
ward. It follows a multinomial distribution with K x K possible outcomes 1 

(10) q z {zi^j,Zi^j) ozp(zi->j | (ji) qj ) p{zi^j | (ij)q-,) p{eij | z}^ , z\^j , B) 
~ Multinomial ((5^ ) 

where = \ exp((7 iiU )q 7 + (7j>)g 7 ) Pu,l (1 - Pu^) 1 '^ , and Z is the 

normalization function to keep J2(u,v) $ij(u,v) = 1- Furthermore, the expec- 
tation of z's according to the multinomial distribution are 

[LI) [Zi^ j:U ) gz - [Zj ^ i:V ) qz - 

Z-iu,v u zj(u,v) Z-tu,v u ij(u,v) 

The update formula for 73 can be derived similarly but some further ap- 
proximation is applied. First, 

(12) g T (7i) cxp(7i | /2,£) p((zi^.) qz , qz \ %) 

= Af(%; M, S) exp((m i ) g /f i - (2N - 2) C(f0) 

where m ifc = Y,j^ii z i^j,k + ^j,fe)> ("^ifc)g z = Ej^i((^j,fe)q z + { z i^j,k)q z ), 
and C(7i) = log(X)fcLi exp^^}). The presence of the normalization con- 
stant C(7i) makes g 7 un-integrable in closed- form. Therefore we apply a 
Laplace approximation to C(ffi) based on a second-order Taylor expansion 
around % (Ahmed and Xing, 2007), such that g 7 (7i) becomes a reparam- 
eterized multivariate normal distribution A/*(7i, Ej) (see Appendix A.l for 
details). In order to get a good approximation, the point of expansion, 7$, 
should be set as close to the query point as possible. Therefore, we set it 
to be the 7, obtained from the previous iteration, i.e. 7[ +1 = Jl where r 
denotes the iteration number. 

The inference algorithm iterates between Eq. 10 and Eq. 12 until conver- 
gence when the relative change of log-likelihood is less than 10 -6 in absolute 
value. The procedure is repeated multiple times with random initialization 
for 7j . The result having the best likelihood is picked as the solution. 



lr The K x K components are flatted into a one-dimension vector. 
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4.2. Parameter Estimation for Logistic-Normal MMSB. The model pa- 
rameters jl, E and B have to be estimated from data E. In the simplest 
case, where time evolution of jl and B is ignored, these can be done via a 
straightforward EM-style procedure. 

In the E step, we use the inference algorithm from Section 4.1 to compute 
the posterior distribution and expectation the latent variables by fixing the 
current parameters. In the M step, we re-estimate the parameters by max- 
imizing the log-likelihood of the data using the posteriors obtained from 
the E-step. Under a Logistic-Normal MMSB, exact computation of the log- 
likelihood is intractable, hence we use an approximation method known as 
variational EM. We obtain the following update formulas for variational 
EM (Ghahramani and Beal, 2001) (see Appendix A. 2 for details): 

(13) k,l = ^ e f m \ A=^£7i, £ = l£E i + Cov(7 1: *) 
The procedure for the learning can be summarized below: 



Learning for Logistic-Normal MMSB: 

1. initialize B ~ U[0, 1], ~ M(0,I), E = 10/ 

2. while not converged (Outer Loop) 

2.1. Initialize qlffi) 

2.2. while not converged and # iteration < threshold (Inner Loop) 

2.2.1. update q(zi-,j, Zi<-j) ~ Multinomial(5y) 

2.2.2. update q(%) ~ Af(% £<) 

2.2.3. update B 

2.3. update jl, E 



The convergence criterion is the same as in inference. It is worth noting 
that the update of role-compatibility matrix B is in the inner loop, which 
means that it is updated as frequently as mixed membership vectors 7$. This 
makes sense because the role-compatibility matrix and mixed membership 
vectors are closely coupled. 

4.3. Variational approximation to dMMSB. When jl is time-evolving as 
in dMMSB, two aspects in the algorithms described in Sections 4.1 and 4.2 
need to be treated differently. First, unlike in Eq. (13), estimation of 
now must be done under an SSM, with {7$ } as the emissions at every time 
point. Second, according to the GMF theorem, the [i appeared in all equa- 
tions in Section 4.1 must now be replace by the posterior mean of /2 (t) under 
this SSM. Bellow we first summarize the algorithm for dMMSB, followed 
by details of the update steps based on the Kalman Filter (KF) and the 
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Rauch-Tung-Striebel (RTS) smoother algorithms. 



Inference for dMMSB: 

1. initialize all fl (t} 

2. while not converged 

2.1. for each t 

2.1.1. call the inference algorithm for MMSB on network in §4.1 
(by passing to it all current estimate of /J (t) ), 

and return the GMF approximation 7 Z , 

2.1.2. update the observations, Y^ = Y^il^ l N 

2.2. RTS smoother update f}^ = jl t \ T based on [Y^}f =1 



Note that in the SSM, we set the state transition matrix A to /. Given all 
model parameters and all the emissions (the current estimate of the mixed 
membership vectors {7^ } of all vertices returned by the logistic normal 
MMSB at each time point), posterior inference of the hidden states /2 (t) can 
be solved according to the following KF and RTS procedure. The major 
update steps in the Kalman Filter are: 



fk+\\t — Afi t \t — fk\t 



P t+l \ t = AP t \ t A T + <D = P At + $ 
iT m = P m | i (P t+1 | t + S m /iV)- 1 

( 14 ) Am|m = h+i\t + K t+1 (Y t+1 - At +1 | t ) 

(15) P t+ i\ t +i = Pt+i\t ~ K t+1 P t+1[t 

where £t| s = E (/#*) \ Y lf . . . ,Y S ) and P t \ s = Var(/iW | Yi, . . . , Y s ). And the 
major update steps in the Rauch-Tung-Striebel smoother are: 

Lt = P t \t^ T P t +i\ t = p t\tP t +i\t 

(16) h\r = At|t + L t{h+i\T - h+i\t) 

(17) P t \ T = Pt\t + L t (P t+ i\T - Pt+i\t)L? 

4.4. Parameter Estimation for dMMSB. We again use the variational 
EM algorithm. The E-step uses the dMMSB inference algorithm in Sec- 
tion 4.3 for compute sufficient statistics /t t | r ,Vi, and the logistic normal 
MMSB inference algorithm in Section 4.2 for computing all sufficient statis- 
tics fifthly In the M-step, model parameters are updated by maximizing 
the log-likelihood obtained from the E-step. From this on, we simplify the 
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linear transition model posed on matrix B and assume that it is constant. 
We derive the following updates for the model parameters B, u, (See 
Appendix A. 3 for some details): 

V V e {t) S {t) 

(is) 4,* = £t ^%7 (M) 

^ T-l ^ T-l 

(19) 6 = -— £ (Am|T " fa\T? + TfT—r £ L t P t+1{T Lj 

t=l t=l 

(20) sW=£(A*|T-7f) 2 + £jf 

(21) z> = Ai|T 

The algorithm can be summarized below: 



Learning for dMMSB: 




1. initialize B ~ W[0, 1], v ~ Af(0,I), /Z^ 


= i/,$ = 10/, £W = 10/ 


2. while not converged 




2.1. Initialize all q{if) 




2.2. while not converged 




2.2.1. foreach t 




2.2.1.1. update q(zi^j, Zi<—j) 


~ Multinomial^) 


2.2.1.2. update qtfi) ~ A/^ 




2.1.2. update B 


based on {F W }f =1 


2.3. RTS smoother update, = /<t|T 


2.4. update ^ , $, 





Notice that in the above algorithm, the variational cluster marginals 
q(zi-,j, Zi^j), q{ji), q^pS 1 ', ■ ■ ■ , 0- T ') each dependents on variational param- 
eters defined by other cluster marginals. Thus overall the algorithm is es- 
sentially a fixed-point iteration that will converge to a local optimum. We 
use multiple random restarts to obtain a near global optimum. 

5. Experiments. 

5.1. Experiments on simulated data. In this section, we first investigate 
the effectiveness of the learning algorithm for Logistic-Normal MMSB using 
synthetic data. We generated three sets of synthetic data each of which 
has 100 individuals and 3 roles, and we used 3 different sets of priors on 
the role- vector and role-compatibility matrices, to mimic different real- life 
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Fig 2. The learning results of the MMSB for synthetic network I to III (from top to 
bottom). In the left column, the figures in each cell contain the role-vectors learned by 
the Logistic-Normal MMSB and the Dirichlet MMSB respectively. The role-vectors are 
projected onto a simplex along with the ground truth: a circle represents the position of a 
ground truth; a cross represents an estimated position; and, each truth- estimation pair is 
linked by a grey line. Note that we used to different colors to denote actors from different 
groups. In the right column, we display the role- compatibility matrices estimated by the 
the Logistic-Normal MMSB and the Dirichlet MMSB respectively. For both models, the 
estimated role- compatibility matrices are both close to the true matrices we used to generate 
the synthetic networks. 

situations. Figure 2 shows our simulation results. For comparison, we also 
produced results using the Dirichlet MMSB. 

For synthetic network I, most actors have a single role and the role- 
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Fig 3. The average distance in (top) L-l and (bottom) L-2 between the ground truth and 
the estimation of the mixed membership vectors in networks that share parameter settings 
as simulation network I, II and III (from left to right). 



compatibility matrix is diagonal which means that actors connect mostly 
with other actors of the same role. It can be seen that the mixed member- 
ship vectors are well recovered by both models. Most of the actors in the 
simplex are close to a corner, which indicates that they have a dominating 
role. Some actors are not close to a corner but close to an edge, which means 
that they have strong memberships for two roles. The remaining actors lying 
near the center of the simplex have mixed memberships for all three roles. In 
general, the difficulty of recovering the mixed membership vector increases 
as an actor possesses more roles. 

In synthetic network II, the true mixed membership vector is qualitatively 
similar to synthetic network I, but the role-compatibility matrix contains off- 
diagonal entries. At a result, an actor in network II is more likely to connect 
with actors of a different role than network I. In this more difficult case, 
our model still accurately estimates the role-compatibility matrix and the 
mixed membership vectors. 

In synthetic network III, we present a very difficult case where many ac- 
tors undertake noticeable mixed roles, and the within-role affinity is very 
weak. Though a few actors near the center of the simplex endure obvious 
discrepancy between the truth and the estimation, less than 10 percent ac- 
tors have a more than 20 percent errors in their role vectors. Furthermore, 
we can see the group structure is still clearly retained. The prediction by 
the Dirichlet MMSB is only slightly different, but the average error rate is 
still close. 

To provide a quantitative comparison between the logistic normal MMSB 
and the Dirichlet MMSB, we compute the average distance between the 
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Fig 4. Posterior mixed membership vectors of the monks projected in a 2-simplex by Log- 
Normal MMSB with 3 roles. Numbered points can be mapped to monks' names using the 
legend on the right. Colors identify the composition of role-vectors. 



ground truth and the estimated mixed membership vectors in the aforemen- 
tioned three settings. We used both L-l and the L-2 distance as the metrices 
in our comparison, and the results are shown in Figure 3. We instantiate each 
network fro ten times to produce the error bar in Figure 3. We can see that 
the Logistic-Normal MMSB performs slightly better for network I and II 
(thought no significant difference is observed). 

5.2. Application to Sampson's monk network: emerging crisis in a cloister. 
Sampson (1969) recorded the social interactions among a group of monks 
while being a resident in a monastery. He collected a lot of sociometric 
rankings on relations such as liking, esteem, praise, etc. Toward the end of 
his study, a major conflict outbroke followed up by a mass departure of the 
members. The unique timing of the study makes the data more interesting 
in the attempt to look for omens of the separation. 

We analyze the networks of liking relationship at three time points. They 
contain 18 members (only junior monks). The networks are directed rather 
than undirected, because one can like another while not vice versa. 

We start with a static analysis on the network of time point 3, which 
is the latest record before the crisis. Several researchers have also studied 
the static network, including Breiger et al. (1975), White et al. (1976) and 
Airoldi et al. (2008). 

The network is fitted by our model with from 1 role to 5 roles. The 
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proper number of roles is selected by Bayesian Information Criterion (BIC). 
Figure 5(b) gives the BIC scores. It suggests that the model with 3 roles is 
the best. 

Figure 4 shows the posterior role- vectors of the monks in the monk liking 
networks by Log-Normal MMSB with three roles. It clearly suggests three 
groups, each of which is close to one vertex of the triangle. Using Sampson's 
labels, the three groups correspond to the Young Turks (monks numbered 1, 
2, 7, 12, 14, 15, 16), the Loyal Opposition (4, 5, 6, 9, 11) + Waverers (8, 10), 
and the Outcasts (3, 17, 18) + Waverer (13). The result is consistent with 
all previous works except for a controversial person, Mark (13). He is known 
as an interstitial member of the monastery. Breiger et al. (1975) placed him 
with the Loyal Opposition, whereas White et al. (1976) and Airoldi et al. 
(2008) placed him among the Outcasts as we do. 

Figure 5(a) provides the estimated role-compatibility matrix. It appears 
that the inter-group relation of liking is strong, while the intra-group relation 
is absent. Together with the fact that most of the individuals have an almost 
pure role, it suggests that an explicit boundary exists between the groups 
leaving the later separation no surprise. 

The trajectories of the varying role-vectors over time inferred by dMMSB 
with three roles are illustrated in Figure 6. Several big changes in role- vectors 
happened from time 1 to time 2, and some minor fluctuation occurred be- 
tween time 2 and time 3. Overall, most persons were stable in the dominant 
role. If we only look at time 3 which is the one we studied earlier in the 
static network analysis, the results of mixed membership and grouping of 
the two models are mostly consistent. Therefore, according to the discussion 
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Fig 6. The role-vectors learned in the dynamic network of liking relationship between 
members in Sampson Monastery. Each color represents a role. 

in the static network analysis, the three roles in the dynamic model can be 
roughly interpreted as Young Turks, Loyal Opposition, and Outcasts. 

One of the persons whose dominant role changed is Ambrose (3). He later 
became an Outcast. However, at time 1, he was connected with both Romul 
(1) and Bonaven (2) in the Young Turks besides his connection with Elias 
(17), an Outcast. It supports our result viewing him mainly as a Young Turk 
at the time. The other two persons are Peter (5) and Hugh (11). They were 
close to some Outcasts at time 1 but flipped to Loyal Opposition at time 
2 where they finally belonged to. It suggests that the Outcast group whose 
member finally got expelled had not been noticeably formed until after these 
big changes happened between time 1 and time 2. 

From time 2 to time 3, it can be observed that the role-vectors were 
purifying, for instances, in monks numbered 1, 3-10, 12, 15-17. Bonaven (2) 
and Albert (14) were the exceptions, but they did not change the general 
trend. The purifying process indicated that the members of different groups 
were more and more isolated, which finally led to the outbreak of a major 
conflict. 

5.3. Application to Gene Regulatory Network of Drosophila Melanogaster. 
In this section, we studied a sequence of gene regulatory networks of Drosophila 
melanogaster sampled at various point of its life cycle. Over the develop- 
mental course of Drosophila melanogaster, there exist multiple underlying 
"themes" that determine the functionalities of each gene and their relation- 
ships to each other, and such themes are dynamical and stochastic. As a re- 
sult, the gene regulatory networks at each time point are context-dependent 
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and can undergo systematic rewiring, rather than being invariant over time. 
In a seminal study by Luscombe et al. (2004), it was shown that the "ac- 
tive regulatory paths" in the gene regulatory networks of Saccharomyces 
cerevisiae exhibit topological changes and hub transience during a temporal 
cellular process, or in response to diverse stimuli. We expect similar prop- 
erties can also be observed for the gene regulatory networks of Drosophila 
melanogaster. 

We reverse engineered the sequence of gene regulatory networks based 
on a time series of microarray measuremetns collected by Arbeitman et al. 
(2002). In the microarray experiments, the expression levels of 4028 genes 
are simultaneously measured at 66 different time points across various de- 
velopmental stages, namely embryonic stage (1-30 time point), larval stage 
(31-40 time point), pupal stage (41-58 time points) and adult stages (59-66 
time points). In this study, we focused on 588 genes that are known to be 
related to developmental process based on their gene ontologies. 

Usually, the samples prepared for microarray experiments are a mixture of 
cells with possibly different expression levels, which means that microarray 
experiments only provide rough estimates of the average expression levels. 
Other sources of noise can also be introduced into the microarray mea- 
surements during, for instance, the stage of hybridization and digitization. 
Therefore, it is more robust if we only model the binary state of the the 
gene expression: either being up-regulated or down-regulated. For this rea- 
son, we transform the gene expression levels into a time series multivariate 
binary observations (-1 for down-regulated and 1 for up-regulated). Due to 
the non-iid. nature of these multivariate binary observations, we model them 
as a sequence of binary Markov Random Fields (MRFs) where the nonzero 
parameters correspond to edges in the dynamic networks. We estimated 
these edges using an t\ regularized kernel reweighting approach in a similar 
manner as Zhou et al. (2008). 

To provide an overview of the dynamic gene regulatory networks, we plot- 
ted two network statistics of as a function of the developmental time point 
(1-66) in Figure 7. The first statistic is the network size as measured by the 
number of edges; and the second is the average local clustering coefficient as 
defined by Watts and Strogatz (1998). For comparison, we normalized both 
statistics to the range between [0,1]. It can be seen that the network size 
and its local clustering coefficient follow very different trajecotories along 
the developmental cycle. The network size exhibits a wave structure fea- 
turing two peaks at mid-embryonic stage and the beginning of pupal stage. 
Similar pattern of gene activity has also been observed by Arbeitman et al. 
(2002). In contrast, the clustering coefficients of the dynamic networks drop 
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Fig 7. (a) Evolution of the network statistics, (b) Example network at mid-embryonic stage 
(time point 15) and (c) at mid-pupal stage (time point 49). Although the two networks are 
comparable in term of their network size, the network at mid-embryonic stage has a better 
local clustering structure as can be seen by comparing (b) and (c). 



sharply after the mid-embryonic stage, and they stays low until the start 
of the adult stage. One explanation is that at the beginning of the devel- 
opment process, genes have more fixed and localized functional roles, and 
they mainly interact with other genes with similar functions; however, after 
mid-embryonic stage, genes become more versitile of the dynamic networks 
drop sharply after the mid-embryonic stage, and they stays low until the 
start of the adult stage. One explanation is that at the beginning of the 
development process, genes have more fixed and localized functional roles, 
and they mainly interact with other genes with similar functions; however, 
after mid-embryonic stage, genes become more versitile and involved in more 
diverse roles to serve the need of rapid development; as the organism turns 
adult, its growth slows down and each gene restored to their more specialized 
role. To illustrate how the network properties change over time, we visual- 
ized two networks from mid-embryonic stage (time point 15) and mid-pupal 
stage (time point 45) in Figure 7(b) and 7(c) respectively. Although the size 
of the two networks are comparable, we can see that there are much clearer 
local clusters of interacting genes during mid-embryonic stage. 

We used dMM SB to analyze a subset of 22 networks which are constructed 
by summing the original networks from every 3 consecutive time points. We 
plotted the mixed membership vector for each gene as it varies across the 
developmental cycle (Figure 8). From the time courses of these mixed mem- 
bership vectors, we can see that many genes assume very different roles 
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during different stages of the development cycle. Particularly, we see that 
many genes exhibit sharp transition in term of their roles near the end of 
embryonic stage. This is consistent with the underlying developmental re- 
quirement of Drosophila that the gene interaction networks need to undergo 
a drastic reconfiguration to accommodate the new stage of larval develop- 
ment. 

We also selected four genes for further studies, namely Optix, dorsal (dl), 
lethal (2) essential for life (l(2)efl) and tolkin (tok). These four genes are 
among the highest degree nodes in the network produced by averaging the 
dynamic networks over time. We want to see how their roles evolve over time, 
and therefore we plotted the trajectory of their mixed membership vector in 
a 4-d simplex (Figure 9). We can see that the trajectory of these genes cover 
a wide area of the 4-d simplex. This is consistent with the roles of gene Optix 
and dl as transcriptional factors that participate in many different functions 
and regulates the expression of a wide range of other genes. For instance, dl 
participates in a diverse range of functions such as anterior /posterior pattern 
formation, dorsal/ ventral axis specification, immune response, gastrulation, 
heart development; Optix participates in nervous system and compound eye 
development. In contrast, gene tok and l(2)efl are not transcriptional factors 
and they are currently only known for very limited functions: tok is related 
to axon guidance and wing vein morphogenesis; l(2)efl related to embryonic 
and heart development. Our results suggests that these genes may play more 
diverse roles than what is current known and deserve further experimental 
studies. 

We further used the mixture membership vectors as features to cluster 
genes at each time point into 4 clusters (each cluster corresponding to a role), 
and studied the gene functions in each role across time. In other words, we 
try to provide a functional decomposition for each role obtained from the 
dMMSB model and investigate how these roles evolve over time. In partic- 
ular, we used 45 ontological groups and plotted the functional enrichment 
from Figure 10 to Figure 17. The overall pattern emerge from these fig- 
ures is that each role consists of genes with a variety of functions, and the 
functional composition of each role varies across time. However, the distri- 
butions over these function groups are very different for different roles: the 
most commmon functional groups for genes in role 1 are related to multicel- 
lular organismal development, cuticle development and pigmentation during 
development; for the second role, the most common functional groups are 
gland morphogenisis, heart development, gut development and ommatidial 
rotation; for the third role, they are stem cell maintenance, sensory organ 
development, central nervous system development, lymphloid organ devel- 
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Fig 8. Changes in mixed membership vectors grouped by individual. The x-axes of each 
subplot is time, and the y-axes is the weight of role- component. Each color stands for a 
role. 
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Fig 9. The trajectories of role-vectors of 4 genes (Optix, dl, tok, l(2)efl) over the 22 time 
points. 
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opment and gland development; for the fourth role, gastrulation, multicel- 
lular organismal development, gut development, stem cell maintenance and 
regionalization. 
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Fig 10. Average gene ontology (GO) enrichment score for role 1. The enrichment score 
for a given function is the number of genes labeled as this function. Note that in the plot 
we have normalized the score to a range between [0, 1], since we are mainly interested in 
the relative count for each GO group. Abbreviations appearing in the figure are: dev. for 
development, proc. for process, morph. for morphogenesis, sys. for system. 
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Fig 11. Temporal evolution of gene ontology enrichment score for role 1. The 22 time 
points are ordered from left to right, and from top to bottom. The order of the gene ontology 
groups are the same as in Figure 10. 
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Fig 12. Average gene ontology (GO) enrichment score for role 2. The enrichment score 
for a given function is the number of genes labeled as this function. Note that in the plot 
we have normalized the score to a range between [0, 1], since we are mainly interested in 
the relative count for each GO group. Abbreviations appearing in the figure are: dev. for 
development, proc. for process, morph. for morphogenesis, sys. for system. 
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Fig 13. Temporal evolution of gene ontology enrichment score for role 2. The 22 time 
points are ordered from left to right, and from top to bottom. The order of the gene ontology 
groups are the same as in Figure 12. 
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Fig 14. Average gene ontology (GO) enrichment score for role 3. The enrichment score 
for a given function is the number of genes labeled as this function. Note that in the plot 
we have normalized the score to a range between [0, 1], since we are mainly interested in 
the relative count for each GO group. Abbreviations appearing in the figure are: dev. for 
development, proc. for process, morph. for morphogenesis, sys. for system. 
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Fig 15. Temporal evolution of gene ontology enrichment score for role 3. The 22 time 
points are ordered from left to right, and from top to bottom. The order of the gene ontology 
groups are the same as in Figure 14- 
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Fig 16. Average gene ontology (GO) enrichment score for role 4- The enrichment score 
for a given function is the number of genes labeled as this function. Note that in the plot 
we have normalized the score to a range between [0, 1], since we are mainly interested in 
the relative count for each GO group. Abbreviations appearing in the figure are: dev. for 
development, proc. for process, morph. for morphogenesis, sys. for system. 
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Fig 17. Temporal evolution of gene ontology enrichment score for role 4- The 22 time 
points are ordered from left to right, and from top to bottom. The order of the gene ontology 
groups are the same as in Figure 16. 
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6. Discussion. Our dynamic mixed membership stochastic block model 
(dMMSB) has several distinctive features. First, the social or biological roles 
in our model are not independent of each other and they can have their own 
internal dependency structures; Second, an actor in the network can be 
fractionally assigned to multiple roles; And third, the mixed membership of 
each actor is allowed to vary temporally. These features provides us extra 
expressive power to better model networks with rich temporal phenomena. 

In practice, this increased modelling power also provide better fit to net- 
works in reality. For instance, the interactions between genes underlying the 
developmental course of an organism are centered around multiple themes, 
such as wing development and muscle development, and these themes are 
tightly related to each other: without the proper development of muscle 
structures, the development and functionality of wings can not be fulfiled. 
As an organism moves along its developmental cycle, the underlying themes 
can evolve and change drastically. For instance, during embryonic stage of 
Drosophila, wing development is simply not present and other processes such 
as the specification of anterior/posterior axis may be more dominant. Many 
genes are very versitle in term of their roles and they differentially interact 
with different genes dependeding on the underlying developmental themes. 
Our model is able to capture these various aspects of the dynamic gene in- 
teraction networks, and hence leads us a step further in understanding the 
biological processes. 

In term of the algorithm, a key ingredient to glue the three features to- 
gether is the logistic normal prior for the mixed membership vector. This 
prior is superior to a Dirichlet prior in our context since the off-diagonal 
entryies of the covariance matrix allow us to code the dependency structure 
between roles. Another advantage of the logistic normal prior is that it can 
be readily coupled with a Kalman Filter for tracking the evolution of the 
roles. However, the drawback of the logistic normal prior is that it is not 
a conjugate prior to the multinomial distribution and therefore additional 
approximation is needed during learning and inference. For this purpose, we 
developed an efficient Laplace variantional inference algorithm. 

There are many dimensions where we can extend our current work. For 
instance, the current model does not explicitly take hubs and cliques of 
the networks into account, and the Kalman filter does not enforces tempo- 
ral smoothness directly over the mixed membership vector but only on its 
prior. Incorporating these elements into our model will be interesting future 
research. 
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A.l. Taylor Approximation. We want to approximate C(ji) by a 
second order Taylor expansion. For simplicity, we temporarily drop the sub- 
script i in this subsection. The Taylor expansion of C(t) w.r.t. any point 7 
is 

(22) C(7) « C( 7 ) + 5 T (7 " 7) + ^7 " 7) T # (7 - 7) 

where 5 is the first derivative (a K x 1 vector), and H is the second derivative 
(a K x K matrix). Only linear and quadratic terms are left. Therefore, 
Equation 12 becomes 

g 7 (7) oc AA(t; ft, E) exp«m} 9 /7 - (27V - 2) £(7)) 
« exp {-^(7 - ftf^ij - ft) + A7 + 7 T B 7 | 

where A = (m)£ - (2iV - 2)^ + (2iV - 2) 7 T i? isalxif vector and 5 = 
— (N — 1)H is a A" x K symmetric matrix. 



Let x = 7 — /2,the exponent becomes 

;(7 - aQ T E _1 (7 - A*) + ^7 + 7 T ^7 



2 

= - -x T 'E~ 1 x + A(x + /2) + (x + ft) T B(x + /I) 

= - \ xT {^ 1 ~ 2B )x + {A + 2ft T B)x + Ci 
(and let IT 1 = XT 1 - 2B, D = A + 2jf B) 

----x T t- l x + Dx + Ci 
2 

= - -{x - £d t ) t £- x {x - tD T ) + c 2 

-- - i(f - /2 - S J D T ) T S" 1 (7 - /Z - £D T ) + C 2 



Therefore, E = (S" 1 - 2B)' 1 = (iT 1 + (2JV - 2)H) * 
7 = ft + ED T = // + S(A T + 2B/2) 
= ft + z({rhi) qz - (2N - 2)g + (2N - 2)H% - (2N - 2)Hft 
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where the first and the second derivatives are: 

exp % 



9(7) 



k 



H(7) 



kl 



Efc ex P Ik 

I(k = 1) exp 7^ exp 7/ 



Efc ex P7fc (Efcexp7 fc ) 2 
or in short, H = diag(g) — gg T 

A. 2. Learning on Logistic-Normal MMSB. The log-likelihood as 
a function of B can be written as: 

1(B) = 5>gE (%a-.,:i:,(l - Pk,if~ eii) ) + Co 

i,j k,l 

<EE <Wo lQ g (>3 ( x - W (1_etf) ) + c o 

= E E ( e *i lo g A,/ + C 1 - e ii) iogl 1 - + Co 

i,j k,l 

(23) =r(5) 

dl*(B) / e ,j 1 - ej 



(24) /Sfci 



EiJ e ij^ij,(k,l) 
$ij,(k,l) 



Jensen's Inequality is applied in the derivation to get an approximation 
(more specifically an upper bound) to the log-likelihood which has an ana- 
lytical solution in finding the maximum point. Setting the derivative to zero 
gives us an MLE estimator of B based on approximation. 

A. 3. Learning on dMMSB. Again, we take an approximation of the 
log- likelihood, which is more tractable. 



KB) = EE lo §E OSko/fi C 1 - M^) + Co 

t i,j k,l 

^eeeCcW^V/m^V^ 

t i,j k,l 

?(*) (Jfi^n. . , ( 1 JtV 



E E E H' log (3 k , + (1 - ) iog(i - k>l )) + Co 

t ij k,l 



(25) =/*(S) 
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The update equation for B is got from maximizing the upper bound of the 
log- likelihood. 



1 " Pk,l 
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